original variable
Knoop: Practical Enhancement of Knockoff with Over-Parameterization for Variable Selection
Zhang, Xiaochen, Cai, Yunfeng, Xiong, Haoyi
Variable selection plays a crucial role in enhancing modeling effectiveness across diverse fields, addressing the challenges posed by high-dimensional datasets of correlated variables. This work introduces a novel approach namely Knockoff with over-parameterization (Knoop) to enhance Knockoff filters for variable selection. Specifically, Knoop first generates multiple knockoff variables for each original variable and integrates them with the original variables into an over-parameterized Ridgeless regression model. For each original variable, Knoop evaluates the coefficient distribution of its knockoffs and compares these with the original coefficients to conduct an anomaly-based significance test, ensuring robust variable selection. Extensive experiments demonstrate superior performance compared to existing methods in both simulation and real-world datasets. Knoop achieves a notably higher Area under the Curve (AUC) of the Receiver Operating Characteristic (ROC) Curve for effectively identifying relevant variables against the ground truth by controlled simulations, while showcasing enhanced predictive accuracy across diverse regression and classification tasks. The analytical results further backup our observations.
Invariant filtering for wheeled vehicle localization with unknown wheel radius and unknown GNSS lever arm
Chauchat, Paul, Bonnabel, Silvรจre, Barrau, Axel
We consider the problem of observer design for a nonholonomic car (more generally a wheeled robot) equipped with wheel speeds with unknown wheel radius, and whose position is measured via a GNSS antenna placed at an unknown position in the car. In a tutorial and unified exposition, we recall the recent theory of two-frame systems within the field of invariant Kalman filtering. We then show how to adapt it geometrically to address the considered problem, although it seems at first sight out of its scope. This yields an invariant extended Kalman filter having autonomous error equations, and state-independent Jacobians, which is shown to work remarkably well in simulations. The proposed novel construction thus extends the application scope of invariant filtering.
Applying Dimensionality Reduction with PCA to Cancer Data
Principal Component Analysis (PCA) is a powerful and well-established data transformation method that can be used for data visualization, dimensionality reduction, and possibly improved performance with supervised learning tasks. In this use case blog, we examine a dataset consisting of measurements of benign and malignant tumors which are computed from digital images of a fine needle aspirate of breast mass tissue. Specifically, these 30 variables describe specific characteristics of the cell nuclei present in the images, such as texture symmetry and radius. The first step in applying PCA to this process was to see if we can more easily visualize separation between the malignant and benign classes in two dimensions. To do this, we first divide our dataset into train and test sets and perform the PCA using only the training data.
Interventions and Counterfactuals in Tractable Probabilistic Models: Limitations of Contemporary Transformations
Papantonis, Ioannis, Belle, Vaishak
In recent years, there has been an increasing interest in studying causality-related properties in machine learning models generally, and in generative models in particular. While that is well motivated, it inherits the fundamental computational hardness of probabilistic inference, making exact reasoning intractable. Probabilistic tractable models have also recently emerged, which guarantee that conditional marginals can be computed in time linear in the size of the model, where the model is usually learned from data. Although initially limited to low tree-width models, recent tractable models such as sum product networks (SPNs) and probabilistic sentential decision diagrams (PSDDs) exploit efficient function representations and also capture high tree-width models. In this paper, we ask the following technical question: can we use the distributions represented or learned by these models to perform causal queries, such as reasoning about interventions and counterfactuals? By appealing to some existing ideas on transforming such models to Bayesian networks, we answer mostly in the negative. We show that when transforming SPNs to a causal graph interventional reasoning reduces to computing marginal distributions; in other words, only trivial causal reasoning is possible. For PSDDs the situation is only slightly better. We first provide an algorithm for constructing a causal graph from a PSDD, which introduces augmented variables. Intervening on the original variables, once again, reduces to marginal distributions, but when intervening on the augmented variables, a deterministic but nonetheless causal-semantics can be provided for PSDDs.
Variable selection with false discovery rate control in deep neural networks
Deep neural networks (DNNs) are famous for their high prediction accuracy, but they are also known for their black-box nature and poor interpretability. We consider the problem of variable selection, that is, selecting the input variables that have significant predictive power on the output, in DNNs. We propose a backward elimination procedure called SurvNet, which is based on a new measure of variable importance that applies to a wide variety of networks. More importantly, SurvNet is able to estimate and control the false discovery rate of selected variables, while no existing methods provide such a quality control. Further, SurvNet adaptively determines how many variables to eliminate at each step in order to maximize the selection efficiency. To study its validity, SurvNet is applied to image data and gene expression data, as well as various simulation datasets.
Optimal whitening and decorrelation
Kessy, Agnan, Lewin, Alex, Strimmer, Korbinian
Whitening, or sphering, is a common preprocessing step in statistical analysis to transform random variables to orthogonality. However, due to rotational freedom there are infinitely many possible whitening procedures. Consequently, there is a diverse range of sphering methods in use, for example based on principal component analysis (PCA), Cholesky matrix decomposition and zero-phase component analysis (ZCA), among others. Here we provide an overview of the underlying theory and discuss five natural whitening procedures. Subsequently, we demonstrate that investigating the cross-covariance and the cross-correlation matrix between sphered and original variables allows to break the rotational invariance and to identify optimal whitening transformations. As a result we recommend two particular approaches: ZCA-cor whitening to produce sphered variables that are maximally similar to the original variables, and PCA-cor whitening to obtain sphered variables that maximally compress the original variables.
Dimension Reduction and Intuitive Feature Engineering for Machine Learning
In the previous parts of this series, we looked at an overview of some popular tricks for feature engineering, and examined those tricks in greater detail. In this part, we continue our closer examination of these approaches with a deeper dive into the final techniques described in Part 1. The examples discussed in this article can be reproduced with the source code and datasets available here. As an analyst, you savor the scenario in which you have a lot of data. But, with a lot of data comes the added complexity of analyzing and making better sense of that data.
Dimension Reduction and Intuitive Feature Engineering for Machine Learning
In the previous parts of this series, we looked at an overview of some popular tricks for feature engineering, and examined those tricks in greater detail. In this part, we continue our closer examination of these approaches with a deeper dive into the final techniques described in Part 1. The examples discussed in this article can be reproduced with the source code and datasets available here. As an analyst, you savor the scenario in which you have a lot of data. But, with a lot of data comes the added complexity of analyzing and making better sense of that data.
Binary Encodings of Non-binary Constraint Satisfaction Problems: Algorithms and Experimental Results
A non-binary Constraint Satisfaction Problem (CSP) can be solved directly using extended versions of binary techniques. Alternatively, the non-binary problem can be translated into an equivalent binary one. In this case, it is generally accepted that the translated problem can be solved by applying well-established techniques for binary CSPs. In this paper we evaluate the applicability of the latter approach. We demonstrate that the use of standard techniques for binary CSPs in the encodings of non-binary problems is problematic and results in models that are very rarely competitive with the non-binary representation. To overcome this, we propose specialized arc consistency and search algorithms for binary encodings, and we evaluate them theoretically and empirically. We consider three binary representations; the hidden variable encoding, the dual encoding, and the double encoding. Theoretical and empirical results show that, for certain classes of non-binary constraints, binary encodings are a competitive option, and in many cases, a better one than the non-binary representation.
Binary Encodings of Non-binary Constraint Satisfaction Problems: Algorithms and Experimental Results
A non-binary Constraint Satisfaction Problem (CSP) can be solved directly using extended versions of binary techniques. Alternatively, the non-binary problem can be translated into an equivalent binary one. In this case, it is generally accepted that the translated problem can be solved by applying well-established techniques for binary CSPs. In this paper we evaluate the applicability of the latter approach. We demonstrate that the use of standard techniques for binary CSPs in the encodings of non-binary problems is problematic and results in models that are very rarely competitive with the non-binary representation. To overcome this, we propose specialized arc consistency and search algorithms for binary encodings, and we evaluate them theoretically and empirically. We consider three binary representations; the hidden variable encoding, the dual encoding, and the double encoding. Theoretical and empirical results show that, for certain classes of non-binary constraints, binary encodings are a competitive option, and in many cases, a better one than the non-binary representation.